Image by Fauxels from Pexels
The American Association of University Professors (AAUP) is a non-profit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011. The report begins with the following data visualisation.
In this lab you will discuss in your groups what makes a good data visualisation and create a better visualisation for the above data.
Please find the team that you formed in last week’s workshop.
In today’s lab you will continue to work collaboratively from the same repository. An important learning objective for today is to be aware of merge conflicts and to learn how to resolve them when pushing changes to your team’s shared repository.
As with last week, it is very important that you follow the instructions carefully and that only the selected team members do the activity at the stated times.
Give each member of your team a number (it can be different from last week) and look out for the following emoji sequence to indicate who should be completing the activity:
If it is not your turn, then advise your team member in doing their task. Do not make any changes to your work and do not make any pushes to or pulls from GitHub. Keep your hands off the keyboard!
Let’s first set up GitHub.
😄🙂🙂🙂 (Member 1 only)
You are the maintainer of the GitHub repository for today’s lab worksheet. This means that you will need to create a copy of today’s lab template and add your team members as collaborators so that they can add their contributions.
First, log onto GitHub and create a new repository by importing today’s lab template project. To remind you of the steps:
Go to Your repositories in your GitHub account and then click on the green New button.
Click on Import a repository and type/copy the URL of today’s lab template project: https://github.com/uoeIDS/lab-03-template
Add an appropriate name to your repository, say lab-03, and click on Begin import.
Next, to add your team members as collaborators:
(🙂😄😄😄 - You should receive a collaboration invitation via email, accept this.)
In addition to the above set-up, it is important to write your team name and team members on the Wooclap this week. Although this information was collected last week, a large number of students were missing, so we are collecting it again. It is important to submit this information as it will be used to form the groups for your group project. Please help us collect this information by …
Go to the main page of Wooclap
Use the event code to enter: XGXJOZ (new code for collecting names to get remaining groups and check inconsistencies)
Follow the instructions and the given example to write your team name and team members.
Submit your answer in the Wooclap at the end.
😄😄😄😄 (For all)
Once everyone has been added to the collaborative repository, open RStudio and create a new version control project using the GitHub repository you have just made. To remind you of the steps:
Open RStudio and go to File > New Project…
Select Version Control and then Git. Type/paste the URL of the repository you have just created.
Browse an appropriate location for the project and then click on Create Project.
PAUSE: Ensure that all team members have successfully created an R project and have pulled the current content from GitHub. Everyone, hands off the computer unless it is your turn!
Please read this section before proceeding
What happens when you push your committed changes from your computer to a repository on GitHub?
It may appear that GitHub simply replaces the version it has with the latest version that you have on your computer. This may not appear to be problematic when working by yourself, but there is a major issue when working collaboratively. Say that you and your friend are working collaboratively, your friend pushes their work first and then you push your work afterwards. If GitHub simply replaced old code with new code then you would risk losing your friend’s work!
What happens is that GitHub attempts to merge the existing and new files.
What actually happens is that there is an initial check to verify that the version currently on GitHub matches the version on your computer at the last time you communicated with GitHub (either via a pull or the previous push). If they are the same, then GitHub will happily replace the old files with the most recent version on your computer when you push the latest committed changes.
However, when working collaboratively, your team member may push their changes which would mean that your personal copy of the repository will be behind the version on GitHub.
In this case, GitHub will stop you from pushing your changes to the shared repository. When this happens, you will need to explicitly “merge” your work and your collaborator’s work before you can push.
If you and your collaborator’s changes are in different files or in different parts of the same file, your work will be automatically merged on your next ⬇️ pull from the shared repository. This is what happened last week each time you pulled the latest changes from your team’s repository.
However, if you and your collaborator have made changes to the same part of a file, then it is not possible to automatically merge the files. This is what is called a merge conflict, as the merge procedure does not know which change you want to keep and which to overwrite. The decision on how to reconcile the differences will have to be made by you.
When there is a merge conflict, additional conflict markers will appear in the file to indicate where the conflict is. This will look like:
The code <<< HEAD indicates the start of the conflict and >>> identifies the end. The content in the middle is partitioned by === to separate your changes (top) from the latest version on GitHub (bottom).
Your job is to reconcile the changes: edit the file so that it incorporates the best of both versions, and then delete the conflict markers (the <<<, ===, and >>> lines).
Once you have reconciled the changes, you should then stage and commit the results. Only then will you be permitted to push your changes to GitHub.
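Before staging and committing, it can help to confirm that no conflict markers are left in the file. The following is a minimal sketch in base R, not part of the lab template: the helper name find_conflict_markers and the toy file are made up for illustration. It assumes Git's usual behaviour of writing each marker as seven repeated characters at the start of a line.

```r
# Hypothetical helper (not part of the lab template): return the line numbers
# in a file that still begin with a Git conflict marker.
find_conflict_markers <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Git writes seven repeated characters: <<<<<<<, ======= and >>>>>>>
  which(grepl("^(<{7}|={7}|>{7})", lines))
}

# Toy file that imitates an unresolved conflict in the author line
demo_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "<<<<<<< HEAD",
  "author: \"Member 2\"",
  "=======",
  "author: \"Member 1\"",
  ">>>>>>> (commit id)",
  "title: \"Lab 03\""
), demo_file)

find_conflict_markers(demo_file)  # lines 1, 3 and 5 still hold markers
```

An empty result suggests the markers have been removed. Note that a line made only of equals signs can also be legitimate Markdown, so treat any hit as a prompt to look at the file rather than as proof of a conflict.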
Whilst you are waiting for your turn, either help each other with their steps or look ahead to the next section on discussing data visualisations.
😄😄😄😄 (For all)
Open lab-03.Rmd. Type your own name at the top of the file and 🧶 Knit the document. ✅ Commit your changes, but do not push to GitHub!
Everyone, hands off your computer.
😄🙂🙂🙂 (Member 1 only)
Hands on your computer.
⬆️ Push your changes to GitHub. This should happen as usual with no issue.
Hands off your computer.
🙂😄🙂🙂 (Member 2 only)
Hands on your computer.
Attempt to ⬆️ push your changes to GitHub. This time you will see a message similar to:
This error message indicates that GitHub has rejected the push because the shared repository now contains member 1’s changes, which member 2 does not yet have in their personal copy.
To resolve the merge conflict:
⬇️ Pull the latest changes from GitHub, then open lab-03.Rmd. You should now see the following at the top of your file, which indicates where the merge conflict is and what it was:
🧶 Knit your document and verify that the author line in the output is correct.
✅ Commit the changes with an informative message, for example resolved merge conflict with authors.
⬆️ Push your changes to GitHub. This time there should not be any issue.
Hands off your computer.
🙂🙂😄🙂 (Member 3 only)
Hands on your computer.
Follow the same instructions as member 2. Begin by attempting to ⬆️ Push your changes, which will result in an error message indicating that your personal version is behind the version on GitHub. Then:
Hands off your computer.
🙂🙂🙂😄 (Member 4 only)
Hands on your computer.
It is now your turn to follow the above procedure. First attempt to ⬆️ push your changes to get the error message. Then ⬇️ pull, find and resolve the merge conflict, 🧶 knit, ✅ commit and finally ⬆️ push your changes.
Hands off your computer.
😄🙂🙂🙂 (Member 1 only)
Although you have contributed your name to the author line, you have not yet had the chance to experience a merge conflict, so it is now your turn!
Hands on your computer.
Edit the author line of your version of lab-03.Rmd so
that it only has your name and your group’s team name. 🧶 Knit
the document and ✅ commit your changes.
If you now attempt to ⬆️ push you will be faced with an error message. Follow the above steps to resolve your merge conflict.
Hands off your computer.
😄😄😄😄 (For all)
Finally, everybody ⬇️ pull the latest changes from the shared repository. You should all now have the same document that has everyone’s name and your team name. Provided that you all followed the above instructions carefully, there should not be any further merge conflicts.
In the Git panel, look out for the message Your branch is ahead of 'origin/master' by 1 commit. This indicates that your personal version is ahead of the version on GitHub. In this case, it is advisable to pull before you push anything to minimise any communication errors.
Look at the following data visualisations. Discuss with your team members what might be problematic with the images. Do any of the visualisations have a problem in any of the 4 respects: people, data, mathematics and computer?
In your groups, take it in turn to work collaboratively in answering the following exercises.
Remember to regularly 🧶 knit, ✅ commit and ⬆️ push your work to your shared repository on GitHub. If you are faced with a merge conflict, then carefully follow the above instructions to reconcile the conflict before pushing your changes. If you come across an issue that you are unsure how to resolve, then please ask a tutor for assistance.
For the following exercises, you will need to use some of the data wrangling functions from the tidyverse package and the data visualisation code from the ggplot2 package. Ensure that you have the following two lines of code at the top of lab-03.Rmd to make the commands available to you.
Let’s start by loading the data from the AAUP that was used to create the data visualisation shown at the beginning of this worksheet.
View the data. Discuss as a team the following questions and write down your answer.
Is the staff data wide or long?
When creating a data visualisation, it is generally preferable to have the data set in a long format. That is to say, each row should relate to a unique case/observation.
If the data set is in a wide format then we need to reshape its structure by pivoting from wide to long using pivot_longer(). The animation below shows how this function works, as well as its counterpart pivot_wider().
Quick reminder: the function has the following arguments:
data, as usual.
cols specifies the columns to pivot into longer format.
names_to is the name of the column where the column names of the pivoted variables go (character string).
values_to is the name of the column where the data in the pivoted variables go (character string).
Fill in the blanks in the following code chunk to pivot the staff data longer and save it as a new data frame called staff_long.
Inspect staff_long. How many rows does it have? Does
this correspond to your answer from Exercise 1?
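To see how the arguments of pivot_longer() fit together, here is a minimal sketch on a made-up data frame. The object names toy_wide and toy_long, the column names group and value, and the numbers are all invented for illustration and are not the AAUP figures.

```r
library(tidyr)

# Toy wide data (made up for illustration, not the AAUP figures):
# one row per group, one column per year
toy_wide <- data.frame(
  group  = c("A", "B"),
  `2001` = c(10, 90),
  `2002` = c(20, 80),
  check.names = FALSE  # keep the year column names exactly as typed
)

toy_long <- pivot_longer(
  toy_wide,
  cols      = -group,   # pivot every column except group
  names_to  = "year",   # old column names go here (as character strings)
  values_to = "value"   # old cell values go here
)

dim(toy_wide)  # 2 rows, 3 columns
dim(toy_long)  # 4 rows, 3 columns: one row per group-year pair
```

The long version has one row for every combination of group and year, which is the same reasoning you can apply when checking the number of rows in staff_long.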
We will begin by plotting instructional staff employment trends as a
dot plot. Copy the following code that creates a dot plot of
percentage on the y-axis against year on the
x-axis, with the dots coloured based on the faculty_type.
Ensure that you understand what each part of the code is doing.
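For reference, the structure of such a dot plot can be sketched on toy data. The variable names year, percentage and faculty_type come from the description above, but the data frame toy_staff and its values are invented for illustration, so this is not necessarily the exact code from the worksheet.

```r
library(ggplot2)

# Toy long-format data (values made up for illustration)
toy_staff <- data.frame(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year         = rep(c("1975", "1989", "2011"), times = 2),
  percentage   = c(30, 25, 20, 70, 75, 80),
  stringsAsFactors = FALSE
)

# aes() maps columns to the x-axis, the y-axis and the dot colour;
# geom_point() draws one dot per row of the data
p <- ggplot(toy_staff, aes(x = year, y = percentage, colour = faculty_type)) +
  geom_point()
p
```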
Perhaps the trend over time can be better visualised using lines
rather than dots. Edit the above code to use the
geom_line() command.
What is wrong with the graph? Have a look at the data and the dot plot for clues as to what might be wrong before progressing to the next exercise. (You do not need to say how to fix it here—that is the next question!)
In the dot plot from exercise 3, notice that the scaling along the x-axis is not consistent. The physical distance between each pair of neighbouring years is the same, but numerically there are 14 years between the first two cases and 2 years between the last two!
The reason for this is that the year variable in staff_long is a "character" variable, not a numerical variable.
Complete the following code to edit the variable type of
year from character to numerical.
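One common way to make this kind of conversion is mutate() together with as.numeric(). The sketch below uses a made-up data frame called toy_years rather than staff_long, and it is only one possible approach, not necessarily the code the worksheet asks you to complete.

```r
library(dplyr)

# Toy data (values made up for illustration); year is stored as text
toy_years <- data.frame(
  year       = c("1975", "1989", "2009", "2011"),
  percentage = c(30, 35, 40, 42),
  stringsAsFactors = FALSE
)
class(toy_years$year)  # "character"

# Overwrite year with a numeric version of itself
toy_years <- toy_years %>%
  mutate(year = as.numeric(year))

class(toy_years$year)  # "numeric"
diff(toy_years$year)   # gaps between consecutive years: 14 20 2
```

Once year is numeric, a plot can space the years according to these gaps instead of treating each year as an equally spaced label.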
Now create the line plot described in exercise 4 to illustrate how the faculty proportions have changed over time.
Improve the line plot from the previous exercise by fixing up its labels (title, axis labels, and legend label) as well as any other components you think could benefit from improvement.
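As a starting point, labels are usually set with labs(). The following sketch uses invented toy data and placeholder label text, so the object name toy_trend and all wording are assumptions that you should replace with your own choices.

```r
library(ggplot2)

# Toy long-format data with a numeric year (values made up for illustration)
toy_trend <- data.frame(
  faculty_type = rep(c("Type A", "Type B"), each = 3),
  year         = rep(c(1975, 1989, 2011), times = 2),
  percentage   = c(30, 25, 20, 70, 75, 80)
)

p <- ggplot(toy_trend, aes(x = year, y = percentage, colour = faculty_type)) +
  geom_line() +
  labs(
    title  = "Placeholder title",        # replace with a descriptive title
    x      = "Year",
    y      = "Percentage of staff (%)",
    colour = "Faculty type"              # legend title for the colour aesthetic
  )
p
```

The legend label is set through the name of the aesthetic it belongs to, which is why the argument here is colour rather than something like legend.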
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story? Write down your idea(s). The more precise you are, the easier the next step will be. Get creative, and think about how you can modify the dataset to give you new/different variables to work with.
Implement at least one of the ideas you came up with in the previous exercise. You should produce an improved data visualisation and accompany it with a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plot and why, and how you addressed these issues in the visualisation you created.
At the end of the lab, you need to ensure that you have your own personal copy of today’s work. Please follow these instructions carefully:
😄😄😄😄 (For all)
Everybody, 🧶 knit, ✅ commit and ⬆️ push any remaining changes to your group’s shared repository on GitHub. In doing so, ensure that you resolve any merge conflicts.
Once the version on GitHub contains everybody’s contribution, ⬇️ Pull the latest changes so that your personal copy is up-to-date.
🙂😄😄😄 (All except member 1)
On GitHub, create your own copy of the shared repository. You can do this using the same instructions as at the start when copying today’s template repository, but importing from member 1’s GitHub account rather than the course account.
If you want to continue to work on today’s lab after the workshop, then you will need to create a new version control project with your personal copy of the repository that you have just created.
😄🙂🙂🙂 (Member 1 only)
At the end of the workshop, you want to ensure that only you can make further changes to the shared repository, so you will need to remove the collaboration permissions of your team members. To do this: